Testing for Differences Among Discrete Distributions: An Application of Model-Based Clustering
نویسندگان
چکیده
We consider the problem of testing for differences among a number of discrete distributions. Our approach is based on viewing the different samples as drawn from a mixture distribution where each mixture component represents a different group or cluster. We adapt the methodology of Dasgupta and Raftery (1998) to this problem, giving explicit details in the case of multinomial and first-order Markov chain distributions. The resulting method is computationally efficient and easy to implement. Applications to two datasets are discussed.
منابع مشابه
Classification and properties of acyclic discrete phase-type distributions based on geometric and shifted geometric distributions
Acyclic phase-type distributions form a versatile model, serving as approximations to many probability distributions in various circumstances. They exhibit special properties and characteristics that usually make their applications attractive. Compared to acyclic continuous phase-type (ACPH) distributions, acyclic discrete phase-type (ADPH) distributions and their subclasses (ADPH family) have ...
متن کاملTesting Several Rival Models Using the Extension of Vuong\'s Test and Quasi Clustering
The two main goals in model selection are firstly introducing an approach to test homogeneity of several rival models and secondly selecting a set of reasonable models or estimating the best rival model to the true one. In this paper we extend Vuong's method for several models to cluster them. Based on the working paper of Katayama $(2008)$, we propose an approach to test whether rival models h...
متن کاملAn improved opposition-based Crow Search Algorithm for Data Clustering
Data clustering is an ideal way of working with a huge amount of data and looking for a structure in the dataset. In other words, clustering is the classification of the same data; the similarity among the data in a cluster is maximum and the similarity among the data in the different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...
متن کاملAn Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...
متن کاملThe Family of Scale-Mixture of Skew-Normal Distributions and Its Application in Bayesian Nonlinear Regression Models
In previous studies on fitting non-linear regression models with the symmetric structure the normality is usually assumed in the analysis of data. This choice may be inappropriate when the distribution of residual terms is asymmetric. Recently, the family of scale-mixture of skew-normal distributions is the main concern of many researchers. This family includes several skewed and heavy-tailed d...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006